fix(CS-11647): hide manual approval UI for auto-executed commands by FadhlanR · Pull Request #5273 · cardstack/boxel

FadhlanR · 2026-06-18T07:47:57Z

Background and Goal

CS-11647: when the AI assistant sent a tool call the host was going to auto-execute, the manual approval UI flashed for ~100ms before auto-execution kicked in. Two surfaces show this UI and both have the same race:

The Accept All / Cancel bar at the bottom of the room.
The per-command Apply button (the Run / actionVerb button) rendered next to each tool-call message.

The bug isn't checkCorrectness-specific — it applies to every auto-execute branch: checkCorrectness, requiresApproval === false, or LLM mode 'act'. All three flow through the same debounced decision in command-service.drainCommandProcessingQueue.

Where to start

packages/host/app/lib/command-auto-execute.ts (new) — isAutoExecutableCommand(command, activeLLMMode, isOwnedByCurrentAgent), a pure predicate that returns true only when the host will actually auto-execute. Also re-homes CHECK_CORRECTNESS_COMMAND_NAME.
packages/host/app/services/command-service.ts — drainCommandProcessingQueue calls the helper instead of inlining the three-condition if-chain.
packages/host/app/components/matrix/room.gts — readyCommands filters out auto-executable commands so the Accept All bar can't paint in its manual-approval branch.
packages/host/app/components/matrix/room-message-command.gts — applyButtonState masks the per-command button into the applying spinner immediately for auto-executable commands, so the Run button never flashes.

Key decisions and non-obvious mechanics

One source of truth. The action bar, the per-command button, and command-service's drain loop all call the same predicate. They can't drift and reintroduce the flash.
Pure function, not a service method. The helper takes only what it needs (command, LLM mode, agent-ownership boolean) so callers compose it locally and it's trivially unit-testable.
Agent-ownership matters. Beyond the three auto-exec conditions, command-service refuses to auto-run a message that came from a different agent (message.agentId !== matrixService.agentId). The predicate has to know this — otherwise a requiresApproval=false command from another agent would mask into the spinner forever (caught the first time by an acceptance test, fixed in the latest commit).
validate() failures are safe. If command-service's drain decides not to run a command (validation failed), it dispatches an invalid commandResult. The per-command button transitions to its invalid state via the result event — no risk of the spinner sticking.
User-perceived flow after the fix (for any auto-executed command): streaming → Working… (existing preparing state) → spinner immediately → result. No Run button, no Accept All / Cancel bar.

Tests

Unit (tests/unit/lib/command-auto-execute-test.ts) — covers all three auto-exec branches plus the agent-ownership short-circuit.
Integration (tests/integration/components/ai-assistant-panel/commands-test.gts):
- checkCorrectness (always-auto-execute branch) → Accept All bar suppressed; asserted immediately and after settled().
- requiresApproval=false via the boxel-environment skill → Accept All bar suppressed.
- patchCardInstance (requires approval, control) → Accept All bar still renders.
- checkCorrectness → per-command button shows applying, not ready.

The LLM-mode-'act' branch is covered by the unit test; doing it end-to-end would need extra mode-state plumbing for diminishing returns since all three branches converge on the same predicate.

github-actions · 2026-06-18T07:55:03Z

Preview deployments

Host staging preview

Host production preview

Host Test Results

1 files 1 suites 1h 58m 43s ⏱️
3 123 tests 3 107 ✅ 15 💤 0 ❌ 1 🔥
3 142 runs 3 125 ✅ 15 💤 1 ❌ 1 🔥

Results for commit 1bd7797.

For more details on these errors, see this check.

Realm Server Test Results

1 files ± 0 1 suites ±0 12m 2s ⏱️ -3s
1 735 tests ± 0 1 735 ✅ ± 0 0 💤 ±0 0 ❌ ±0
3 461 runs +1 633 3 461 ✅ +1 633 0 💤 ±0 0 ❌ ±0

Results for commit 1bd7797. ± Comparison against earlier commit ad2e63d.

The manual Accept All / Cancel bar briefly flashed below an assistant tool call before command-service started auto-executing it. The drain loop decides "auto-execute" inside an async 100ms-debounced task, but the room's readyCommands getter only filtered by execution status — so during the debounce window the bar painted, then yanked itself when acceptingAllRoomIds flipped. Pull the auto-execute decision into a synchronous predicate (isAutoExecutableCommand) and call it from both command-service (to decide whether to run) and room.gts readyCommands (to decide whether to render the bar). Single source of truth for the three conditions — checkCorrectness, requiresApproval=false, LLM mode act. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

isAutoExecutableCommand has three branches (checkCorrectness, requiresApproval=false, LLM mode 'act') and the readyCommands filter treats them uniformly. Rename the existing integration test so it reads as "any always-auto-executed command" instead of a checkCorrectness-specific case, and add a second test that drives the requiresApproval=false branch via the boxel-environment skill. The LLM-mode='act' branch is still locked in by the unit test — exercising it end-to-end would need extra LLM-mode-state plumbing for diminishing returns since all three branches converge on the same filter line in readyCommands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The per-command Apply button (rendered next to each tool-call message) flashes through its 'ready' state — a clickable Run button — for the ~100ms debounce window before command-service starts the auto-execute run. Same root cause as the Accept All bar. Reuse isAutoExecutableCommand: when status is ready/undefined and the helper says this command will auto-execute, render the applying state (spinner) immediately. The button then transitions naturally to applied on completion. If validate fails in the drain, command-service emits an invalid commandResult and the button settles into its invalid state — no risk of the spinner sticking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI surfaced a regression on the per-command Apply button fix: Acceptance | Commands tests: ShowCard command added from a skill, is not automatically executed when agentId does not match A ShowCard command sent by another agent has requiresApproval=false on the skill, so isAutoExecutableCommand returned true and the button masked into the applying spinner — but command-service refuses to run it because of the agentId gate (drainCommandProcessingQueue line 354-357). The button would have stuck on the spinner forever. Thread isOwnedByCurrentAgent through the predicate and short-circuit to false when it's false. Callers compute it from `message.agentId === matrixService.agentId`. command-service passes `true` because its outer loop already short-circuits on the same condition before reaching the predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The new isAutoExecutableCommand agent-ownership check fails closed when message.agentId doesn't equal matrixService.agentId. The new CS-11647 integration tests went through simulateRemoteMessage without setting `data.context.agentId`, so the helper short-circuited to false in the test environment and the Accept All bar / per-command ready button rendered for an unrelated reason. CI surfaced this on the two tests that depended on the helper actually returning true: not ok 121 ... Accept All bar does not flash for a requiresApproval=false command not ok 122 ... per-command Apply button does not flash Run before auto-execute starts Pass the current agent's id explicitly. The checkCorrectness Accept All test also gets the context (previously passing by accident because validate marks status=invalid for an ad-hoc checkCorrectness call, which the readyCommands filter already excludes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ng is stuck drainCommandProcessingQueue used to `continue` past a wedged room without emitting any commandResult, so the synthetic "applying" spinner in room-message-command.gts had no terminal event to fall through and hung forever. Dispatch an `invalid` commandResult for each auto-executable command on that path so the UI surfaces the invalidCommandState alert with a Try Anyway button. Also extract STUCK_PROCESSING_TIMEOUT_MS so tests can target the same threshold the prod path uses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ing state While the synthetic "applying" spinner is up (auto-executable command between landing and drainCommandProcessingQueue starting the run task), the card's idle test attribute used to disagree with the apply button: the apply button reported applying, the card still reported idle. Drive both off applyButtonState so the two attributes always agree about whether the spinner is visible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nd drop ticket prefix - New unit-flavoured integration test exercising invalidateAutoExecutableCommandsForStuckProcessing directly: spies on matrixService.sendCommandResultEvent and asserts only the auto- executable command gets an invalid dispatch with the expected failureReason. Avoids stubbing the ember-resources proxy, which silently no-ops Object.defineProperty. - Add a coherence assertion that data-test-command-card-idle is omitted while the synthetic applying state is on (Glimmer drops attributes bound to falsy expressions, so the test selector is presence-based). - Drop the per-test CS-11647 prefix from titles now that the cluster is large enough to read on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes a UI flash where manual approval controls briefly appear for commands that the host will auto-execute, by centralizing the auto-execute decision into a shared predicate and using it consistently across the command drain loop and both UI surfaces.

Changes:

Introduces isAutoExecutableCommand() (and re-homes CHECK_CORRECTNESS_COMMAND_NAME) as a single source of truth for auto-execution eligibility.
Updates command draining + room UI to hide/mask manual approval UI immediately for auto-executable commands.
Adds unit + integration tests covering the three auto-exec branches plus the agent-ownership gate.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
packages/host/app/lib/command-auto-execute.ts	Adds shared auto-execution predicate and constant to prevent UI/logic drift.
packages/host/app/services/command-service.ts	Uses shared predicate for auto-execution; adds stuck-processing invalidation for auto-exec commands.
packages/host/app/components/matrix/room.gts	Filters out auto-executable commands from `readyCommands` to prevent Accept All bar flashing.
packages/host/app/components/matrix/room-message-command.gts	Masks per-command Apply button into immediate “applying” state for auto-executable commands.
packages/host/tests/unit/lib/command-auto-execute-test.ts	Unit coverage for predicate branches + agent-ownership short-circuit.
packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts	Integration coverage ensuring approval UI doesn’t flash for auto-exec paths and still renders for manual approval.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+      let invokedToolFromEventId =
+        this.getCurrentEventIdForCommandRequest(
+          roomId,
+          messageCommand.commandRequest.id,
+        ) ?? messageCommand.eventId;
+      await this.matrixService.sendCommandResultEvent({
+        roomId,
+        invokedToolFromEventId,
+        toolCallId: messageCommand.commandRequest.id!,
+        status: 'invalid',
+        failureReason: `Room processing did not finish within ${Math.round(
+          STUCK_PROCESSING_TIMEOUT_MS / 1000,
+        )}s; command was not started`,
+        context: await this.operatorModeStateService.getSummaryForAIBot(),
+      });


+// How long drainCommandProcessingQueue waits for a room resource that's
+// still processing before giving up on the event. In tests we shorten this
+// so the stuck-timeout invalidation path can be exercised in a single test
+// without holding a real test open for a minute.
+const STUCK_PROCESSING_TIMEOUT_MS = isTesting() ? 1000 : 60_000;


FadhlanR force-pushed the cs-11647-accept-all-flash-before-check-correctness branch from 19f74e0 to 07dc010 Compare June 18, 2026 13:52

FadhlanR and others added 3 commits June 18, 2026 22:28

FadhlanR changed the title ~~fix(CS-11647): hide Accept All bar for auto-executed commands~~ fix(CS-11647): hide manual approval UI for auto-executed commands Jun 19, 2026

FadhlanR and others added 4 commits June 19, 2026 11:06

FadhlanR marked this pull request as ready for review June 19, 2026 09:27

FadhlanR requested a review from a team June 19, 2026 09:27

habdelra requested a review from Copilot June 19, 2026 12:55

Copilot started reviewing on behalf of habdelra June 19, 2026 12:59 View session

Copilot AI reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(CS-11647): hide manual approval UI for auto-executed commands#5273

fix(CS-11647): hide manual approval UI for auto-executed commands#5273
FadhlanR wants to merge 8 commits into
mainfrom
cs-11647-accept-all-flash-before-check-correctness

FadhlanR commented Jun 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

FadhlanR commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background and Goal

Where to start

Key decisions and non-obvious mechanics

Tests

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preview deployments

Host Test Results

Realm Server Test Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

FadhlanR commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading